Abstract. We use the generic chaining device proposed by Talagrand to establish exponential bounds on the deviation probability of some suprema of random processes. Then, given a random vector $\xi$ in $\mathbb{R}^n$ the components of which are independent and admit a suitable exponential moment, we deduce a deviation inequality for the squared Euclidean norm of the projection of $\xi$ onto a linear subspace of $\mathbb{R}^n$. Finally, we provide an application of such an inequality to statistics, performing model selection in the regression setting when the errors are possibly non-Gaussian and the collection of models possibly large.



1.1. Controlling suprema of random processes. Let $(X_t)_{t\in T}$ be real-valued and centered random variables indexed by a countable and nonempty set $T$ and
$$Z = \sup_{t\in T} X_t.$$

A central problem in Probability and Statistics is to provide a suitable control of the probability of deviation of $Z$. When $T$ is a (countable) bounded subset of a metric space $(\mathcal{X}, d)$, a common technique is to use a chaining device. The basic idea is to decompose $X_t$ into series of the form
$$X_t = \sum_{k\ge 0}\left(X_{t_{k+1}} - X_{t_k}\right)$$
where $X_{t_0} = 0$ a.s. and $(t_k)_{k\ge 1}$ is a sequence of elements of $T$ converging towards $t$ and such that for each $k$, $t_k$ belongs to a suitable finite subset $T_k$ of $T$. Then, the control of $\sup_{t\in T} X_t$ amounts to that of the increments $X_{t_{k+1}} - X_{t_k}$ simultaneously for all $k$ and all pairs $(t_k, t_{k+1})\in T_k\times T_{k+1}$ which are close. This approach seems to go back to Kolmogorov and was very popular in Statistics in the 90s to control suprema of empirical processes with regard to the entropy of $T$, see van de Geer (1990) and Barron et al (1999) for example. However, this approach suffers from the drawback that it leads to pessimistic numerical constants that are in general too large to be used in statistical procedures. An alternative to chaining is the use of the concentration phenomenon of some probability measures such as the Gaussian distribution for instance. Indeed, when the $X_t$ are Gaussian, for all $u > 0$ we have
$$\mathbb{P}\left(Z \ge \mathbb{E}(Z) + \sqrt{2v^2u}\right) \le e^{-u} \quad\text{where } v^2 = \sup_{t\in T}\mathrm{var}(X_t). \tag{1}$$

This inequality is due to Sudakov & Cirel'son (1974). A nice feature of (1) lies in the fact that it allows one to recover the usual deviation bound for Gaussian random variables when $T$ reduces to a single element. Compared to chaining, Inequality (1) provides a powerful tool for controlling suprema of Gaussian processes as soon as one is able to evaluate $\mathbb{E}(Z)$ sharply enough.

It is the merit of Talagrand (1995) to extend this approach for the purpose of controlling suprema of empirical processes, that is, when $X_t$ takes the form $\sum_{i=1}^n\left(t(\xi_i) - \mathbb{E}(t(\xi_i))\right)$ with $T$ a set of uniformly bounded functions and $\xi_i$ independent random variables. Yet, the original result by Talagrand involved suboptimal numerical constants and many efforts were made to recover it with sharper ones. A first step in this direction is due to Ledoux (1996) by means of nice entropy and tensorisation arguments. Then, further refinements were made on Ledoux's result by Massart (2000), Rio (2002) and Bousquet (2002), the latter author achieving the best possible result in terms of constants. Nowadays, these entropy arguments have become a popular way of establishing deviation and concentration inequalities for $Z$ around its expectation. For a nice and complete introduction to these inequalities (and their applications to statistics) we refer the reader to the book by Massart (2007).

Bousquet's inequality can be recovered (with worse constants) by applying the following result of Klein & Rio (2005) (Theorem 1.1). Actually, we write it in a slightly different form with possibly larger constants.

Theorem 1 (Klein & Rio). For each $t\in T$, let $(X_{i,t})_{i=1,\ldots,n}$ be independent (but not necessarily i.i.d.) centered random variables with values in $[-c, c]$ and set $X_t = \sum_{i=1}^n X_{i,t}$. For all $u > 0$,
$$\mathbb{P}\left(Z \ge \mathbb{E}(Z) + \sqrt{2\left(v^2 + 2c\,\mathbb{E}(Z)\right)u} + 3cu\right) \le \exp(-u) \tag{2}$$
where $v^2 = \sup_{t\in T}\mathrm{var}(X_t)$.

This inequality should be compared to Bernstein's inequality that we recall below (see also Massart (2007) for related conditions). Indeed, it can be shown that a sum $X = \sum_{i=1}^n X_i$ of independent centered random variables $X_i$ with values in $[-c, c]$ for $i = 1, \ldots, n$ does satisfy Condition (3) below with $v^2 = \mathrm{var}(X)$. Consequently, Inequality (2) generalizes Bernstein's (with worse constants) to suprema of countable families of such $X$.
Theorem 2 (Bernstein's inequality). Let $X_1, \ldots, X_n$ be independent random variables and set $X = \sum_{i=1}^n\left(X_i - \mathbb{E}(X_i)\right)$. Assume that there exist nonnegative numbers $v, c$ such that for all $k\ge 2$
$$\sum_{i=1}^n\mathbb{E}\left[|X_i|^k\right] \le \frac{k!}{2}\,v^2c^{k-2}. \tag{3}$$
Then, for all $u > 0$
$$\mathbb{P}\left(X \ge \sqrt{2v^2u} + cu\right) \le e^{-u}. \tag{4}$$
Besides, for all $x > 0$,
$$\mathbb{P}(X \ge x) \le \exp\left(-\frac{x^2}{2(v^2 + cx)}\right). \tag{5}$$
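As a quick numerical check of how (4) and (5) relate, the sketch below compares the two deviation levels. It uses plain Python and the standard library only, and the function names are ours and purely illustrative. It computes the level $\sqrt{2v^2u} + cu$ of (4) and the level at which the right-hand side of (5) equals $e^{-u}$. Solving $x^2 = 2u(v^2 + cx)$ gives $x_5(u) = cu + \sqrt{c^2u^2 + 2v^2u}$. This level lies between $\sqrt{2v^2u} + cu$ and $\sqrt{2v^2u} + 2cu$, so (4) is the slightly sharper of the two statements.

```python
import math

def level_4(v2, c, u):
    """Deviation level sqrt(2 v^2 u) + c u of Inequality (4)."""
    return math.sqrt(2.0 * v2 * u) + c * u

def tail_5(v2, c, x):
    """Right-hand side exp(-x^2 / (2 (v^2 + c x))) of Inequality (5)."""
    return math.exp(-x * x / (2.0 * (v2 + c * x)))

def level_5(v2, c, u):
    """Level x at which the bound (5) equals e^{-u}: root of x^2 = 2u(v^2 + c x)."""
    return c * u + math.sqrt(c * c * u * u + 2.0 * v2 * u)

for v2, c, u in [(1.0, 0.0, 2.0), (4.0, 1.0, 3.0), (0.5, 2.0, 10.0)]:
    x4, x5 = level_4(v2, c, u), level_5(v2, c, u)
    # (4) reaches probability e^{-u} at a level no larger than (5) does.
    print(f"v2={v2} c={c} u={u}: level (4) = {x4:.3f}, level (5) = {x5:.3f}")
```

The two levels coincide when $c = 0$, and the gap between them is at most $cu$.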



In the literature, (3) together with the fact that the $X_i$ are independent is sometimes replaced by the weaker condition
$$\mathbb{E}\left[e^{\lambda X}\right] \le \exp\left(\frac{\lambda^2v^2}{2(1 - \lambda c)}\right), \quad\forall\lambda\in(0, 1/c). \tag{6}$$

In this paper, we shall mainly deal with this type of assumption which has the advantage of depending on the law of $X$ only.

Looking at condition (6), a natural question arises. Is it possible to establish an analogue of Klein & Rio's result when one replaces the assumption that the $X_{i,t}$ belong to $[-c, c]$ by a suitable assumption on $T$ and the Laplace transforms of the $X_t$? An attempt at solving this problem can be found in Bousquet (2003). There, the author considered the case $X_t = \sum_{i=1}^n\xi_it_i$ where $T$ is a subset of $[-1, 1]^n$ and the $\xi_i$ are independent and centered random variables satisfying
$$\mathbb{E}\left[|\xi_i|^k\right] \le \frac{k!}{2}\,\sigma^2c^{k-2}, \quad\forall k\ge 2, \tag{7}$$
which implies (6) with $v^2 = v^2(t) = |t|_2^2\sigma^2$. Unfortunately, it turns out that the result by Bousquet provides an analogue of (2) with $v^2$ replaced by $n\sigma^2$ although one would expect the smaller quantity $v^2 = \sup_{t\in T}v^2(t)$.

1.2. Chi-square type random variables and model selection. Originally, this result by Bousquet was motivated by a statistical application. In order to give an account of how such processes arise in Statistics, consider the problem of estimating $f$ from the observation of the random vector $Y = f + \xi$ in $\mathbb{R}^n$. Given a linear subspace $S$ of $\mathbb{R}^n$, the classical least-squares estimator of $f$ in $S$ is given by $\hat f = \Pi_SY = \Pi_Sf + \Pi_S\xi$ where $\Pi_S$ denotes the orthogonal projector onto $S$. Since the (squared) Euclidean distance between $f$ and $\hat f$ decomposes as $|f - \hat f|_2^2 = |f - \Pi_Sf|_2^2 + |\Pi_S\xi|_2^2$, the study of the quadratic loss $\mathbb{E}\left[|f - \hat f|_2^2\right]$ requires that of its random component $|\Pi_S\xi|_2^2$. This quantity is usually called a $\chi^2$-type variable by analogy with the Gaussian case. Its study is connected to that of $Z$ by the formula
$$|\Pi_S\xi|_2 = \sup_{t\in T}\sum_{i=1}^n\xi_it_i$$
where $T$ is a countable and dense subset of the (Euclidean) unit ball of $S$. The control of such random variables is fundamental to perform model selection from the observation of $Y$ in the regression setting. When the $\xi_i$ admit only a few finite moments, a control of such a $Z$ can be found in Baraud (2000) by means of a Rosenthal-type inequality. By using chaining techniques, Baraud, Comte & Viennet (2001) handled the case of sub-Gaussian $\xi_i$. The Gaussian case was studied by Birgé & Massart (2001) by using the concentration Inequality (1). More recently, Sauvé (2008) considered $\xi_i$ which satisfy (7). She discussed the fact that the inequality obtained in Bousquet (2003) was unfortunately inadequate for controlling $|\Pi_S\xi|_2^2$. She solved the problem when $S$ consists of vectors the components of which are constant on each element of a given partition.
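The decomposition of the loss and the representation of $|\Pi_S\xi|_2$ as a supremum can be checked numerically. The following sketch is ours, not the author's code, and every input in it ($S$, $f$, the noise) is hypothetical. It builds an orthonormal basis of a two-dimensional subspace $S$ of $\mathbb{R}^8$ by Gram-Schmidt and draws $f$ and $\xi$. It then verifies the Pythagorean identity, and checks that $\langle\xi, t\rangle$ over unit vectors $t\in S$ never exceeds $|\Pi_S\xi|_2$, the value attained at $t = \Pi_S\xi/|\Pi_S\xi|_2$.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormal basis of Span(vectors); assumes linear independence."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            p = dot(w, e)
            w = [wi - p * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def project(x, basis):
    """Orthogonal projection of x onto the span of an orthonormal basis."""
    out = [0.0] * len(x)
    for e in basis:
        p = dot(x, e)
        out = [oi + p * ei for oi, ei in zip(out, e)]
    return out

rng = random.Random(0)
n = 8
basis = gram_schmidt([[1.0] * n, [float(i) for i in range(n)]])  # hypothetical S
f = [math.sin(i) for i in range(n)]                               # hypothetical mean
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]                      # noise
y = [fi + e for fi, e in zip(f, xi)]

f_hat = project(y, basis)                        # least-squares estimator Pi_S Y
proj_f, proj_xi = project(f, basis), project(xi, basis)
loss = sum((a - b) ** 2 for a, b in zip(f, f_hat))
bias2 = sum((a - b) ** 2 for a, b in zip(f, proj_f))
chi = math.sqrt(dot(proj_xi, proj_xi))           # |Pi_S xi|_2

# <xi, t> over random unit vectors t = (a e0 + b e1) / |(a, b)| of S.
best_random = max(
    dot(xi, [a * e0 + b * e1 for e0, e1 in zip(*basis)]) / math.hypot(a, b)
    for a, b in ((rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000))
)
t_star = [p / chi for p in proj_xi]
print(loss, bias2 + chi ** 2, dot(xi, t_star), chi, best_random)
```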

1.3. What is this paper about? In this paper, our motivations are twofold.

First, we present an exponential bound for the probability of deviation of $Z = \sup_{t\in T}X_t$ under a suitable bound on the Laplace transform of the increments $X_t - X_s$ with $s, t\in T$. Our approach is inspired by that described in the book of Talagrand (2005) for evaluating the expectations of suprema of random variables. Talagrand's approach relies on the idea of decomposing $T$ into partitions rather than into nets, as was usually done before. The inequalities we get with this technique suffer from the usual drawback that the numerical constants are non-optimal. They do, however, allow a suitable control of $\chi^2$-type random variables over more general linear spaces $S$ than those considered in Sauvé (2008).

Second, we shall apply these inequalities for the purpose of selecting an appropriate least-squares estimator among a (possibly exponentially large) collection of candidate ones. Apart from the case of histogram-type estimators, performing model selection in this context under the assumption that the errors satisfy (7) seems to be new. Besides, unlike Sauvé (2008), our estimation procedure does not assume that an upper bound for the sup-norm of the regression function is known.

The paper is organized as follows. We present our deviation bound for $Z$ in Section 2. We give an application to Statistics in Section 3: we perform model selection for the purpose of estimating the mean of a random vector, restricting ourselves there to collections of models based on linear spans of piecewise or trigonometric polynomials. The case of more general linear spaces will be considered in Section 4. Section 5 is devoted to the proofs.

Throughout the paper we shall assume that $n\ge 2$ and use the following notations. We denote by $e_1, \ldots, e_n$ the canonical basis of $\mathbb{R}^n$ which we endow with the Euclidean inner product denoted $\langle\cdot,\cdot\rangle$. For $x\in\mathbb{R}^n$, we set
$$|x|_2 = \sqrt{\langle x, x\rangle}, \quad |x|_1 = \sum_{i=1}^n|x_i| \quad\text{and}\quad |x|_\infty = \max_{i=1,\ldots,n}|x_i|.$$
The linear span of a family $u_1, \ldots, u_k$ of vectors is denoted by $\mathrm{Span}\{u_1, \ldots, u_k\}$. The quantity $|I|$ is the cardinality of a finite set $I$. Finally, $\kappa$ denotes the numerical constant 18. It appears in the control of the deviation of $Z$ when applying Talagrand's chaining argument. As a consequence, it will appear all along the paper, and it seems interesting to us to stress how this constant is involved in the statistical procedure we propose.

2. A Talagrand-type chaining argument for controlling suprema of random variables

Let $(X_t)_{t\in T}$ be a family of real-valued and centered random variables indexed by a countable and nonempty set $T$. Fix some $t_0$ in $T$ and set
$$Z = \sup_{t\in T}\left(X_t - X_{t_0}\right) \quad\text{and}\quad \bar Z = \sup_{t\in T}\left|X_t - X_{t_0}\right|.$$
Our aim is to give a probabilistic control of the deviations of $Z$ (and $\bar Z$). We make the following assumptions.

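Assumption 1. There exist two distances $d$ and $\delta$ on $T$ and a nonnegative constant $c$ such that for all $s, t\in T$ ($s\ne t$)
$$\mathbb{E}\left[e^{\lambda(X_t - X_s)}\right] \le \exp\left(\frac{\lambda^2d^2(s,t)}{2\left(1 - \lambda c\,\delta(s,t)\right)}\right), \quad\forall\lambda\in\left[0, \frac{1}{c\,\delta(s,t)}\right) \tag{8}$$
with the convention $1/0 = +\infty$.

The case $c = 0$ corresponds to the situation where the increments of the process $(X_t)_{t\in T}$ are sub-Gaussian.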

In this section, we also assume that $d$ and $\delta$ derive from norms. This is the only case we need to consider to handle the statistical problem described in Section 3. Nevertheless, a more general result with arbitrary distances can be found in Section 5.

Assumption 2. Let $S$ be a linear space with dimension $D < +\infty$ endowed with two arbitrary norms denoted $\|\cdot\|_2$ and $\|\cdot\|_\infty$ respectively. The set $T$ is a subset of $S$ and for all $s, t\in T$, $d(s,t) = \|t - s\|_2$ and $\delta(s,t) = \|t - s\|_\infty$. Besides,
$$T\subset\left\{t\in S \mid \|t - t_0\|_2\le v,\ c\|t - t_0\|_\infty\le b\right\}.$$

Then, the following result holds.

Theorem 3. Under Assumptions 1 and 2,
$$\mathbb{P}\left(Z\ge\kappa\left(\sqrt{v^2(D + x)} + b(D + x)\right)\right)\le e^{-x}, \quad\forall x > 0, \tag{9}$$
with $\kappa = 18$. Moreover
$$\mathbb{P}\left(\bar Z\ge\kappa\left(\sqrt{v^2(D + x)} + b(D + x)\right)\right)\le 2e^{-x}, \quad\forall x > 0. \tag{10}$$



If $T$ is no longer countable but admits a countable dense subset $T'$ (with respect to $\|\cdot\|_2$ or $\|\cdot\|_\infty$, both norms being equivalent on $S$) and if the paths $t\mapsto X_t$ are continuous with probability 1, Theorem 3 still holds since
$$\sup_{t\in T}\left(X_t - X_{t_0}\right) = \sup_{t\in T'}\left(X_t - X_{t_0}\right) \quad\text{a.s.}$$



Let us now turn to some examples. In the sequel, we take $t_0 = 0$, $T\subset\mathbb{R}^n$ and $X_t = \langle\xi, t\rangle$ where the random vector $\xi = (\xi_1, \ldots, \xi_n)$ has independent and centered components.



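Comparison with the (sub) Gaussian case. Assume that for some $\sigma > 0$
$$\max_{i=1,\ldots,n}\log\mathbb{E}\left[e^{\lambda\xi_i}\right]\le\frac{\lambda^2\sigma^2}{2}, \quad\forall\lambda\in\mathbb{R}. \tag{11}$$
This assumption holds, for example, when the $\xi_i$ are Gaussian with mean 0 and variance $\sigma^2$, or when the $\xi_i$ are bounded by $\sigma$. Consider some linear subspace $S$ of $\mathbb{R}^n$ with dimension $D$ and $T$ the Euclidean ball of $S$ centered at 0 of radius $r > 0$. It follows from (11) that Assumptions 1 and 2 hold with $c = 0$, $b = 0$, $d(s,t) = \|t - s\|_2 = \sigma|t - s|_2$ and $v = \sigma r$. On the one hand, we obtain from Theorem 3 the inequality
$$\mathbb{P}\left(Z\ge\kappa\sigma r\left(\sqrt{D} + \sqrt{x}\right)\right)\le\mathbb{P}\left(Z\ge\kappa\sigma r\sqrt{D + x}\right)\le e^{-x}, \quad\forall x > 0. \tag{12}$$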



In order to comment on this bound, let us compare it to Inequality (1) when the $\xi_i$ are Gaussian. In this case, $\sup_{t\in T}\mathrm{var}(X_t) = \sigma^2r^2$. Since $Z^2/(\sigma r)^2$ is a $\chi^2$ random variable with $D$ degrees of freedom, $\mathbb{E}(Z)\le\mathbb{E}^{1/2}(Z^2) = \sigma r\sqrt{D}$. Hence, on the other hand, Inequality (1) gives
$$\mathbb{P}\left(Z\ge\sigma r\left(\sqrt{D} + \sqrt{2x}\right)\right)\le e^{-x}, \quad\forall x > 0.$$
Except for the numerical constant $\kappa$, we see that this bound is comparable to (12). One could argue that the original bound (1) is better since we have replaced $\mathbb{E}(Z)$ by the upper bound $\sigma r\sqrt{D}$. In fact, it can easily be checked that this quantity gives the right order of magnitude of $\mathbb{E}(Z)$, since $\mathbb{E}(Z)\ge\sigma r\sqrt{2\pi^{-1}D}$.
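The following sketch makes the comparison concrete. It is ours and illustrative only, and it sets $\sigma r = 1$. It computes the ratio between the level $\kappa\sigma r\sqrt{D + x}$ coming from (12) and the Gaussian level $\sigma r(\sqrt D + \sqrt{2x})$; elementary inequalities show that this ratio always lies in $[\kappa/2, \kappa]$. It also evaluates the exact mean of a $\chi$ variable with $D$ degrees of freedom, $\sqrt2\,\Gamma((D+1)/2)/\Gamma(D/2)$, which equals $\mathbb{E}(Z)/(\sigma r)$ in the Gaussian case, and checks it against $\sqrt{2\pi^{-1}D}$ and $\sqrt{D}$.

```python
import math

KAPPA = 18.0

def level_theorem3(sigma_r, D, x):
    """Level kappa * sigma * r * sqrt(D + x) of (12) (case b = 0, v = sigma * r)."""
    return KAPPA * sigma_r * math.sqrt(D + x)

def level_gaussian(sigma_r, D, x):
    """Level sigma * r * (sqrt(D) + sqrt(2x)) derived from (1) with E(Z) <= sigma r sqrt(D)."""
    return sigma_r * (math.sqrt(D) + math.sqrt(2.0 * x))

def mean_chi(D):
    """Mean of a chi variable with D degrees of freedom: sqrt(2) Gamma((D+1)/2) / Gamma(D/2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((D + 1) / 2.0) - math.lgamma(D / 2.0))

for D in (1, 5, 50):
    for x in (0.5, 5.0):
        ratio = level_theorem3(1.0, D, x) / level_gaussian(1.0, D, x)
        print(f"D={D:3d} x={x}: ratio of levels = {ratio:.2f}")
```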

Comparison with Inequalities (4) and (2). Assume now that $\xi$ satisfies for some positive numbers $\sigma$ and $c$,
$$\max_{i=1,\ldots,n}\log\mathbb{E}\left[e^{\lambda\xi_i}\right]\le\frac{\lambda^2\sigma^2}{2(1 - |\lambda|c)}, \quad\forall\lambda\in(-1/c, 1/c). \tag{13}$$
As a first simple example, let us take $S = \mathrm{Span}\{\mathbf{1}\}$ where $\mathbf{1} = (1, \ldots, 1)'\in\mathbb{R}^n$ and $T = \{\lambda\mathbf{1}, \lambda\in[-1, 1]\}$. Under (13), Assumptions 1 and 2 hold with:
- $d(s,t) = \|s - t\|_2 = \sigma|t - s|_2$;
- $\delta(s,t) = \|s - t\|_\infty = |s - t|_\infty = \max_{i=1,\ldots,n}|s_i - t_i|$;
- $v^2 = n\sigma^2$ and $b = c$.

We can therefore apply Theorem 3 (with $D = 1$) and get
$$\mathbb{P}\left(Z\ge\kappa\left(\sqrt{n(1 + x)\sigma^2} + c(1 + x)\right)\right)\le e^{-x}, \quad\forall x > 0. \tag{14}$$
On the other hand, for such a set $T$, $Z$ is merely $|\langle\xi, \mathbf{1}\rangle| = |\sum_{i=1}^n\xi_i|$. Using Bernstein's Inequality (4) twice (with $\xi$ and $-\xi$) and $u = x + \log(2)$, we derive
$$\mathbb{P}\left(Z\ge\sqrt{2n(\log(2) + x)\sigma^2} + c(\log(2) + x)\right)\le e^{-x}, \quad\forall x > 0.$$
This bound is comparable to (14).
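For this one-dimensional example the two levels can be compared directly. The sketch below is ours, and the Rademacher noise is a hypothetical choice: it satisfies (13) with $\sigma = 1$ and $c = 0$, since $\log\cosh\lambda\le\lambda^2/2$. We compute the level in (14), the level derived from Bernstein's inequality, and a Monte Carlo estimate of $\mathbb{P}(Z\ge\text{level})$ for the latter. The simulation is only illustrative: it uses a seeded generator and 2000 replications.

```python
import math
import random

KAPPA = 18.0

def level_14(n, sigma, c, x):
    """Level appearing in (14)."""
    return KAPPA * (math.sqrt(n * (1.0 + x) * sigma ** 2) + c * (1.0 + x))

def level_bernstein(n, sigma, c, x):
    """Level obtained from (4) applied to xi and -xi with u = x + log(2)."""
    u = x + math.log(2.0)
    return math.sqrt(2.0 * n * u * sigma ** 2) + c * u

def empirical_tail(n, level, trials, seed=0):
    """Frequency of |xi_1 + ... + xi_n| >= level for i.i.d. Rademacher xi_i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        hits += abs(s) >= level
    return hits / trials

n, x = 100, 1.0
lb = level_bernstein(n, 1.0, 0.0, x)
l14 = level_14(n, 1.0, 0.0, x)
freq = empirical_tail(n, lb, trials=2000)
print(f"Bernstein level {lb:.1f}, level (14) {l14:.1f}, "
      f"empirical tail {freq:.3f} <= e^-x = {math.exp(-x):.3f}")
```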

Let us now take $S$ as any linear subspace of $\mathbb{R}^n$ of dimension $D$,
$$T = \left\{t\in S \mid |t|_2\le v,\ |t|_\infty\le 1\right\}$$
and assume $\sigma = 1$ for simplicity. When $|\xi_i|\le c$ for all $i$, the assumptions of Theorems 1 and 3 are both satisfied, so we can compare our Inequality (9) to that of Klein & Rio (Inequality (2)). On the one hand, the inequality by Klein & Rio gives that with probability at least $1 - e^{-x}$, $Z\le z(x)$ where
$$z(x) = \mathbb{E}(Z) + \sqrt{2\left(v^2 + 2c\,\mathbb{E}(Z)\right)x} + 2cx.$$
The concavity of $x\mapsto\sqrt{x}$ together with the elementary inequality $2ab\le a^2 + b^2$ lead to the following upper and lower bounds for $z(x)$:
$$\mathbb{E}(Z) + \sqrt{2v^2x} + cx\le z(x)\le 3\left(\mathbb{E}(Z) + \sqrt{2v^2x} + cx\right).$$
On the other hand, our inequality gives that with probability at least $1 - e^{-x}$, $Z\le\kappa w(x)$ where
$$w(x) = \sqrt{v^2(D + x)} + c(D + x)$$
and similar computations yield
$$\frac{1}{\sqrt 2}\left(\sqrt{Dv^2} + cD + \sqrt{v^2x} + cx\right)\le w(x)\le\sqrt{Dv^2} + cD + \sqrt{v^2x} + cx.$$
Except for the numerical constants, the main difference between Klein & Rio's Inequality and ours lies in the fact that $\mathbb{E}(Z)$ is replaced by $\mathcal{E} = \sqrt{Dv^2} + cD$. It follows from the Cauchy-Schwarz Inequality that
$$\mathbb{E}(Z)\le\sqrt{Dv^2}\le\mathcal{E} = \sqrt{Dv^2} + cD,$$
showing that our bound $w(x)$ involves an upper bound for $\mathbb{E}(Z)$. Under the only assumption that $\xi$ satisfies (13), the problem of replacing $\mathcal{E}$ by $\mathbb{E}(Z)$ remains open. Nevertheless, the term $\sqrt{Dv^2}$ turns out to be of order $\mathbb{E}(Z)$ in typical situations (think of the Gaussian case). Our bound then becomes comparable to that given by Klein & Rio as soon as $c^2D\le v^2$. As we shall see in Section 5.3, this turns out to be enough to derive deviation bounds for $\chi^2$-type random variables in many situations of interest.
3. An application to model selection in the regression framework



Let $Y$ be a random vector of $\mathbb{R}^n$ with independent components. In this section, our aim is to estimate $f = \mathbb{E}(Y)$ under the assumption that the components of the noise $\xi = Y - f$ satisfy
$$\log\mathbb{E}\left[e^{\lambda\xi_i}\right]\le\frac{\lambda^2\sigma^2}{2(1 - |\lambda|c)}, \quad\forall\lambda\in(-1/c, 1/c),\ i = 1, \ldots, n \tag{15}$$
for some known positive numbers $\sigma$ and $c$. Inequality (15) holds for a large class of distributions (once suitably centered) including Poisson, exponential, Gamma... Besides, (15) is fulfilled when the $\xi_i$ satisfy (7).
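For instance, the sketch below checks (15) on a grid of $\lambda$ values for two of the distributions mentioned above. It is ours and illustrative, and it uses their closed-form log-Laplace transforms. The first case is a centered $\mathrm{Exp}(1)$ variable with $\sigma^2 = 1$ and $c = 1$. The second is a centered Poisson($\mu$) variable with $\sigma^2 = \mu$ and $c = 1/3$. We derived these constants by hand from the power series of the log-Laplace transforms; they are not taken from the paper and are not claimed to be optimal. The last check shows that the exponential case genuinely needs $c > 0$.

```python
import math

def log_mgf_centered_exponential(lam):
    """log E exp(lam (X - 1)) for X ~ Exp(1); finite for lam < 1."""
    return -lam - math.log1p(-lam)

def log_mgf_centered_poisson(lam, mu):
    """log E exp(lam (X - mu)) for X ~ Poisson(mu)."""
    return mu * (math.expm1(lam) - lam)

def rhs_15(lam, sigma2, c):
    """Right-hand side lam^2 sigma^2 / (2 (1 - |lam| c)) of (15)."""
    return lam * lam * sigma2 / (2.0 * (1.0 - abs(lam) * c))

grid = [k / 100.0 for k in range(-99, 100)]
# Centered Exp(1): sigma^2 = 1, c = 1 (our constants, not the paper's).
ok_exp = all(log_mgf_centered_exponential(l) <= rhs_15(l, 1.0, 1.0) + 1e-12 for l in grid)
# Centered Poisson(mu): sigma^2 = mu, c = 1/3 (uses k! >= 2 * 3^(k-2) for k >= 2).
mu = 2.5
ok_poi = all(log_mgf_centered_poisson(3 * l, mu) <= rhs_15(3 * l, mu, 1.0 / 3.0) + 1e-12
             for l in grid)
print(ok_exp, ok_poi)
```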

Our estimation strategy is based on model selection. We start with a (possibly large) collection $\{S_m, m\in\mathcal{M}\}$ of linear subspaces (models) of $\mathbb{R}^n$. To each of these we associate the least-squares estimator $\hat f_m = \Pi_{S_m}Y$. Given a penalty function pen from $\mathcal{M}$ to $\mathbb{R}_+$, we define the penalized criterion $\mathrm{crit}(\cdot)$ on $\mathcal{M}$ by
$$\mathrm{crit}(m) = \left|Y - \hat f_m\right|_2^2 + \mathrm{pen}(m). \tag{16}$$
In this section, we propose to establish risk bounds for the estimator $\hat f_{\hat m}$ of $f$, where the index $\hat m$ is selected from the data among $\mathcal{M}$ as any minimizer of $\mathrm{crit}(\cdot)$.

In the sequel, the penalty pen will be based on some a priori choice of nonnegative numbers $\{\Delta_m, m\in\mathcal{M}\}$ for which we set
$$\Sigma = \sum_{m\in\mathcal{M}}e^{-\Delta_m} < +\infty.$$
When $\Sigma = 1$, the choice of the $\Delta_m$ can be viewed as that of a prior distribution on the models $S_m$. For related conditions and their interpretation, see Barron and Cover (1991) or Barron et al (1999).

In the following sections, we give an account of our main result (to be presented in Section 4.2) for some typical collections of linear spaces $\{S_m, m\in\mathcal{M}\}$.



3.1. Selecting among histogram-type estimators. For a partition $m$ of $\{1, \ldots, n\}$, $S_m$ denotes the linear span of vectors of $\mathbb{R}^n$ the coordinates of which are constant on each element $I$ of $m$. In the sequel, we shall restrict ourselves to partitions $m$ the elements of which consist of consecutive integers.

Consider a partition $\bar m$ of $\{1, \ldots, n\}$ and $\mathcal{M}$ a collection of partitions $m$ such that $S_m\subset S_{\bar m}$. We obtain the following result.
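The sketch below illustrates (16) for histogram models. It is ours, and all its settings are hypothetical. It computes $\Pi_{S_m}Y$ for a partition into consecutive blocks by averaging $Y$ over each block. It then selects among regular partitions into $D$ blocks by minimizing $|Y - \hat f_m|_2^2 + K\kappa^2\sigma^2(|m| + \Delta_m)$, the form the penalty takes when $c = 0$. The example assumes $K = 1.1$, $\Delta_m = 1$, Gaussian noise with $\sigma = 0.01$ (which satisfies (15) with $c = 0$) and a four-step signal.

```python
import random

KAPPA = 18.0

def regular_partition(n, D):
    """Partition of {0, ..., n-1} into D blocks of consecutive integers (D divides n)."""
    size = n // D
    return [range(j * size, (j + 1) * size) for j in range(D)]

def project_histogram(y, partition):
    """Pi_{S_m} y: each coordinate is replaced by the mean of y over its block."""
    out = [0.0] * len(y)
    for block in partition:
        mean = sum(y[i] for i in block) / len(block)
        for i in block:
            out[i] = mean
    return out

def crit(y, partition, sigma, K=1.1, delta=1.0):
    """Criterion (16) with the c = 0 penalty K kappa^2 sigma^2 (|m| + Delta_m)."""
    fit = project_histogram(y, partition)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    return rss + K * KAPPA ** 2 * sigma ** 2 * (len(partition) + delta)

n, sigma = 64, 0.01
rng = random.Random(1)
f = [float((i // 16) % 2) for i in range(n)]          # four steps: 0, 1, 0, 1
y = [fi + rng.gauss(0.0, sigma) for fi in f]
scores = {D: crit(y, regular_partition(n, D), sigma) for D in (1, 2, 4, 8, 16, 32, 64)}
D_hat = min(scores, key=scores.get)
print("selected number of blocks:", D_hat)
```

With such a small noise level, the four-block partition matching the signal wins, while finer partitions pay roughly $K\kappa^2\sigma^2$ per extra block.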

Proposition 1. Let $a, b > 0$. Assume that
$$|I|\ge a^2\log^2(n), \quad\forall I\in\bar m. \tag{17}$$
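If for some $K > 1$,
$$\mathrm{pen}(m)\ge K\kappa^2\left(\sigma^2 + 2c\,\frac{(c + \sigma)(b + 2)}{a\kappa}\right)\left(|m| + \Delta_m\right), \quad\forall m\in\mathcal{M}, \tag{18}$$
the estimator $\hat f_{\hat m}$ satisfies
$$\mathbb{E}\left[\left|f - \hat f_{\hat m}\right|_2^2\right]\le C(K)\inf_{m\in\mathcal{M}}\left(\mathbb{E}\left[\left|f - \hat f_m\right|_2^2\right] + \mathrm{pen}(m)\right) + R \tag{19}$$
where $C(K)$ is given by (25) and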

{c+a){b + 2] 



R = K^{a'^ + 2c- 



an 



S + 2 



(c + a)^(6 + 2)- 



Note that when $c = 0$, Inequality (18) holds as soon as
$$\mathrm{pen}(m) = K\kappa^2\sigma^2\left(|m| + \Delta_m\right), \quad\forall m\in\mathcal{M}. \tag{20}$$
Besides, by taking $a = \log^{-1}(n)$ we see that Condition (17) becomes automatically satisfied and by letting $b$ tend to $+\infty$, Inequality (19) holds with
pen given by (20) and R = k^ct^S. 

The problem of selecting among histogram-type estimators in this regression setting has recently been investigated in Sauvé (2008). Her selection procedure is similar to ours with a different choice of the penalty term. Unlike hers, our penalty does not involve an upper bound $M$ (assumed to be known) on $|f|_\infty$.



3.2. Families of piecewise polynomials. In this section, we assume that $f$ is of the form $(F(1/n), \ldots, F(n/n))$ where $F$ is an unknown function on $(0, 1]$. Our aim is to estimate $F$ by an estimator which is a piecewise polynomial of degree not larger than $d$ based on a data-driven choice of a partition of $(0, 1]$.

In the sequel, we shall consider partitions $m$ of $\{1, \ldots, n\}$ such that each element $I\in m$ consists of at least $d + 1$ consecutive integers. For such a partition, $S_m$ denotes the linear span of vectors of the form $(P(1/n), \ldots, P(n/n))$ where $P$ varies among the space of piecewise polynomials with degree not larger than $d$ based on the partition of $(0, 1]$ given by
$$\left\{\left(\frac{\min I - 1}{n}, \frac{\max I}{n}\right],\ I\in m\right\}.$$
Consider a partition $\bar m$ of $\{1, \ldots, n\}$ and $\mathcal{M}$ a collection of partitions $m$ such that $S_m\subset S_{\bar m}$. We obtain the following result.

Proposition 2. Let $a, b > 0$. Assume that
$$|I|\ge(d + 1)a^2\log^2(n)\ge d + 1, \quad\forall I\in\bar m. \tag{21}$$
If for some $K > 1$,

pen(m) > K.^ ( + c ^^^" + ^^^^ + + ) (D^ + A^) , e M. 
\ an I 

the estimator $\hat f_{\hat m}$ satisfies (19) with



, an I a?nP 

3.3. Families of trigonometric polynomials. As in the previous section, we assume here that $f$ is of the form $(F(x_1), \ldots, F(x_n))$ where $x_i = i/n$ for $i = 1, \ldots, n$ and $F$ is an unknown function on $(0, 1]$. Our aim is to estimate $F$ by a trigonometric polynomial of degree not larger than some $D > 0$.

Consider the (discrete) trigonometric system $\{\phi_j\}_{j\ge0}$ of vectors in $\mathbb{R}^n$ defined by
$$\phi_0 = \left(1/\sqrt{n}, \ldots, 1/\sqrt{n}\right),$$
$$\phi_{2j-1} = \sqrt{\frac{2}{n}}\left(\cos(2\pi jx_1), \ldots, \cos(2\pi jx_n)\right), \quad\forall j\ge1,$$
$$\phi_{2j} = \sqrt{\frac{2}{n}}\left(\sin(2\pi jx_1), \ldots, \sin(2\pi jx_n)\right), \quad\forall j\ge1.$$
Let $\mathcal{M}$ be a family of subsets of $\{0, \ldots, 2D\}$. For $m\in\mathcal{M}$, we define $S_m$ as the linear span of the $\phi_j$ with $j\in m$ (with the convention $S_m = \{0\}$ when $m = \emptyset$).

Proposition 3. Let $a, b > 0$. Assume that $2D + 1\le\sqrt{n}/(a\log(n))$. If for some $K > 1$,

2 / 2 , 4c(c + a)(5 + 2) 



pen(m) > K/c^ |^cr^ + ^ -j {Dm + A^) , e M 

then $\hat f_{\hat m}$ satisfies (19) with

a2(2D + l)n'' 



4. Towards a more general result

We consider the statistical framework presented in Section 3 and give a general result that allows one to handle Propositions 1, 2 and 3 simultaneously. It will rely on some geometric properties of the linear spaces $S_m$ that we describe below.
4.1. Some geometric quantities. Let $S$ be a linear subspace of $\mathbb{R}^n$. We associate to $S$ the following quantities
$$\Lambda_2(S) = \max_{i=1,\ldots,n}\left|\Pi_Se_i\right|_2 \quad\text{and}\quad \Lambda_\infty(S) = \max_{i=1,\ldots,n}\left|\Pi_Se_i\right|_1. \tag{22}$$
It is not difficult to see that these quantities can be interpreted in terms of norm connections, more precisely
$$\Lambda_2(S) = \sup_{t\in S\setminus\{0\}}\frac{|t|_\infty}{|t|_2} \quad\text{and}\quad \Lambda_\infty(S) = \sup_{t\in\mathbb{R}^n\setminus\{0\}}\frac{|\Pi_St|_\infty}{|t|_\infty}.$$
Clearly, $\Lambda_2(S)\le1$. Besides, since $|x|_1\le\sqrt{n}|x|_2$ for all $x\in\mathbb{R}^n$, $\Lambda_\infty(S)\le\sqrt{n}\Lambda_2(S)$. Nevertheless, these bounds can be rather rough as shown by the following proposition.

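Proposition 4. Let $\mathcal{P}$ be some partition of $\{1, \ldots, n\}$, $J$ some nonempty index set and
$$\{\phi_{j,I},\ (j, I)\in J\times\mathcal{P}\}$$
an orthonormal system such that for some $\Phi > 0$ and all $I\in\mathcal{P}$
$$\sup_{j\in J}|\phi_{j,I}|_\infty\le\frac{\Phi}{\sqrt{|I|}} \quad\text{and}\quad \langle\phi_{j,I}, e_i\rangle = 0, \quad\forall i\notin I.$$
If $S$ is the linear span of the $\phi_{j,I}$ with $(j, I)\in J\times\mathcal{P}$,
$$\Lambda_2^2(S)\le\left(\frac{|J|\Phi^2}{\min_{I\in\mathcal{P}}|I|}\right)\wedge1 \quad\text{and}\quad \Lambda_\infty(S)\le\left(|J|\Phi^2\right)\wedge\left(\sqrt{n}\Lambda_2(S)\right).$$

Proof of Proposition 4. We have already seen that $\Lambda_2(S)\le1$ and $\Lambda_\infty(S)\le\sqrt{n}\Lambda_2(S)$, so it remains to show that
$$\Lambda_2^2(S)\le\frac{|J|\Phi^2}{\min_{I\in\mathcal{P}}|I|} \quad\text{and}\quad \Lambda_\infty(S)\le|J|\Phi^2.$$
Let $i\in\{1, \ldots, n\}$. There exists some unique $I\in\mathcal{P}$ such that $i\in I$ and since $\langle\phi_{j,I'}, e_i\rangle = 0$ for all $I'\ne I$,
$$\Pi_Se_i = \sum_{j\in J}\langle\phi_{j,I}, e_i\rangle\phi_{j,I}.$$
Consequently,
$$|\Pi_Se_i|_2^2 = \sum_{j\in J}\langle\phi_{j,I}, e_i\rangle^2\le\frac{|J|\Phi^2}{|I|}$$
and
$$|\Pi_Se_i|_1 = \sum_{i'\in I}\left|\sum_{j\in J}\langle\phi_{j,I}, e_i\rangle\langle\phi_{j,I}, e_{i'}\rangle\right|\le|I|\times\frac{|J|\Phi^2}{|I|} = |J|\Phi^2.$$
We conclude since $i$ is arbitrary. □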
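For the histogram spaces of Section 3.1, Proposition 4 applies with $|J| = 1$, $\phi_I = \mathbf{1}_I/\sqrt{|I|}$ and $\Phi = 1$. The sketch below is ours and uses a hypothetical partition. It computes $\Lambda_2(S)$ and $\Lambda_\infty(S)$ directly from the vectors $\Pi_Se_i$, which for $i\in I$ equal $\mathbf{1}_I/|I|$. It then compares them with the bounds of the proposition and with the rough bound $\sqrt{n}\Lambda_2(S)$.

```python
import math

def histogram_projection_of_basis_vector(i, block_sizes):
    """Pi_S e_i for S = vectors constant on consecutive blocks of the given sizes."""
    n = sum(block_sizes)
    out = [0.0] * n
    start = 0
    for size in block_sizes:
        if start <= i < start + size:
            for k in range(start, start + size):
                out[k] = 1.0 / size
        start += size
    return out

def lambdas(block_sizes):
    """(Lambda_2(S), Lambda_inf(S)) of (22) computed from the vectors Pi_S e_i."""
    n = sum(block_sizes)
    lam2 = lam_inf = 0.0
    for i in range(n):
        p = histogram_projection_of_basis_vector(i, block_sizes)
        lam2 = max(lam2, math.sqrt(sum(v * v for v in p)))
        lam_inf = max(lam_inf, sum(abs(v) for v in p))
    return lam2, lam_inf

sizes = [5, 9, 12, 30]                      # hypothetical partition of {1, ..., 56}
lam2, lam_inf = lambdas(sizes)
# Proposition 4 with |J| = 1, Phi = 1 gives Lambda_2^2 <= 1 / min |I| and
# Lambda_inf <= 1, whereas the rough bound sqrt(n) * Lambda_2 is much larger.
print(lam2 ** 2, 1.0 / min(sizes), lam_inf, math.sqrt(sum(sizes)) * lam2)
```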
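4.2. The main result. Let $\{S_m, m\in\mathcal{M}\}$ be a family of linear spaces and $\{\Delta_m, m\in\mathcal{M}\}$ a family of nonnegative weights. We define $S_n = \sum_{m\in\mathcal{M}}S_m$ and
$$\Lambda_\infty = \left(\sup_{m, m'\in\mathcal{M}}\Lambda_\infty(S_m + S_{m'})\right)\vee1.$$

Theorem 4. Let $K > 1$ and $z > 0$. Assume that for all $i = 1, \ldots, n$, Inequality (15) holds. Let pen be some penalty function satisfying
$$\mathrm{pen}(m)\ge K\kappa^2\left(\sigma^2 + \frac{2cu}{\kappa}\right)\left(D_m + \Delta_m\right), \quad\forall m\in\mathcal{M} \tag{23}$$
(where $D_m$ denotes the dimension of $S_m$) and
$$u = (c + \sigma)\Lambda_\infty\Lambda_2(S_n)\log\left(n^2e^z\right). \tag{24}$$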

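If one selects $\hat m$ among $\mathcal{M}$ as any minimizer of $\mathrm{crit}(\cdot)$ defined by (16) then
$$\mathbb{E}\left[\left|f - \hat f_{\hat m}\right|_2^2\right]\le C(K)\inf_{m\in\mathcal{M}}\left(\mathbb{E}\left[\left|f - \hat f_m\right|_2^2\right] + \mathrm{pen}(m)\right) + R$$
where
$$C(K) = \frac{K(K^2 + K - 1)}{(K - 1)^3} \tag{25}$$
and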



R 



2cu 



E + 2 



u 

Aor 
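The sketch below, which is ours with hypothetical numerical values, shows how (24) produces the penalty of Proposition 1. It evaluates $u$ with $\log(n^2e^z)$ expanded as $2\log n + z$ to avoid overflow. For histograms one has $\Lambda_\infty = 1$ and, under (17), $\Lambda_2(S_n)\le1/(a\log n)$. Choosing $z = b\log n$ then gives $u = (c + \sigma)(b + 2)/a$, and the factor in (23) becomes $K\kappa^2(\sigma^2 + 2c(c + \sigma)(b + 2)/(a\kappa))$. The choice $z = b\log n$ and the use of the worst-case value of $\Lambda_2(S_n)$ are our reading of how the two statements connect.

```python
import math

KAPPA = 18.0

def u_24(c, sigma, lam_inf, lam_2, n, z):
    """u = (c + sigma) * Lambda_inf * Lambda_2(S_n) * log(n^2 e^z), as in (24)."""
    return (c + sigma) * lam_inf * lam_2 * (2.0 * math.log(n) + z)

def penalty_factor(K, c, sigma, u):
    """Factor K kappa^2 (sigma^2 + 2 c u / kappa) multiplying D_m + Delta_m in (23)."""
    return K * KAPPA ** 2 * (sigma ** 2 + 2.0 * c * u / KAPPA)

# Histogram case: Lambda_inf = 1 and, under (17), Lambda_2(S_n) <= 1 / (a log n).
a, b, c, sigma, n = 0.5, 1.0, 0.2, 1.0, 10_000
u = u_24(c, sigma, 1.0, 1.0 / (a * math.log(n)), n, z=b * math.log(n))
print(u, (c + sigma) * (b + 2.0) / a)             # the two values coincide
print(penalty_factor(1.1, c, sigma, u))
```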



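When $c = 0$ we derive the following corollary by letting $z$ grow towards infinity.

Corollary 1. Let $K > 1$. Assume that the $\xi_i$ for $i = 1, \ldots, n$ satisfy Inequality (15) with $c = 0$. If one selects $\hat m$ among $\mathcal{M}$ as a minimizer of crit defined by (16) with pen satisfying
$$\mathrm{pen}(m)\ge K\kappa^2\sigma^2\left(D_m + \Delta_m\right), \quad\forall m\in\mathcal{M},$$
then
$$\mathbb{E}\left[\left|f - \hat f_{\hat m}\right|_2^2\right]\le\frac{K(K^2 + K - 1)}{(K - 1)^3}\inf_{m\in\mathcal{M}}\left(\mathbb{E}\left[\left|f - \hat f_m\right|_2^2\right] + \mathrm{pen}(m)\right) + R$$
where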



R = 



{K - 1)2 



S. 
5. Proofs

We start with the following result, which generalizes Theorem 3 to the case where $d$ and $\delta$ are not induced by norms. We assume that $T$ is finite and take numbers $v$ and $b$ such that
$$\sup_{s\in T}d(s, t_0)\le v, \quad \sup_{s\in T}c\,\delta(s, t_0)\le b. \tag{26}$$
We consider now a family of finite partitions $(\mathcal{A}_k)_{k\ge0}$ of $T$, such that $\mathcal{A}_0 = \{T\}$ and for $k\ge1$ and $A\in\mathcal{A}_k$
$$d(s, t)\le2^{-k}v \quad\text{and}\quad c\,\delta(s, t)\le2^{-k}b, \quad\forall s, t\in A.$$
Besides, we assume $\mathcal{A}_k\subset\mathcal{A}_{k-1}$ for all $k\ge1$, which means that all elements $A\in\mathcal{A}_k$ are subsets of an element of $\mathcal{A}_{k-1}$. Finally, we define for $k\ge0$
$$N_k = |\mathcal{A}_{k+1}||\mathcal{A}_k|.$$

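Theorem 5. Let $T$ be some finite set. Under Assumption 1,
$$\mathbb{P}\left(Z\ge H + 2\sqrt{2v^2x} + 2bx\right)\le e^{-x}, \quad\forall x > 0 \tag{27}$$
where
$$H = \sum_{k\ge0}2^{-k}\left(\sqrt{2v^2\log\left(2^{k+1}N_k\right)} + b\log\left(2^{k+1}N_k\right)\right).$$
Moreover,
$$\mathbb{P}\left(\bar Z\ge H + 2\sqrt{2v^2x} + 2bx\right)\le2e^{-x}, \quad\forall x > 0. \tag{28}$$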

The quantity $H$ can be related to the entropies of $T$ with respect to the distances $d$ and $c\delta$ (when $c\ne0$) in the following way. We first recall that for a distance $e(\cdot,\cdot)$ on $T$ and $\varepsilon > 0$, the entropy $H(T, e, \varepsilon)$ is defined as the logarithm of the minimum number of balls of radius $\varepsilon$ with respect to $e$ which are necessary to cover $T$. Note that for $k\ge0$, each element $A$ of the partition $\mathcal{A}_{k+1}$ is a subset of both a ball of radius $2^{-(k+1)}v$ with respect to $d$ and a ball of radius $2^{-(k+1)}b$ with respect to $c\delta$. Besides, since $|\mathcal{A}_{k+1}|\le N_k$, we obtain that for all $\varepsilon\in[2^{-(k+1)}, 2^{-k})$
$$H(T, \varepsilon) = \max\left\{H(T, d, \varepsilon v), H(T, c\delta, \varepsilon b)\right\}\le\log(N_k).$$
By integrating with respect to $\varepsilon$ (and using (26)), we deduce that
$$\int_0^{+\infty}\left(\sqrt{2v^2H(T, \varepsilon)} + bH(T, \varepsilon)\right)d\varepsilon\le H.$$
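5.1. Proof of Theorem 5. Note that we obtain (28) by using (27) twice (once with $X_t$ and then with $-X_t$). Let us now prove (27). For each $k\ge1$ and $A\in\mathcal{A}_k$, we choose some arbitrary element $t_k(A)$ in $A$. For each $t\in T$ and $k\ge1$, there exists a unique $A\in\mathcal{A}_k$ such that $t\in A$ and we set $\pi_k(t) = t_k(A)$. When $k = 0$, we set $\pi_0(t) = t_0$.

We consider the (finite) decomposition
$$X_t - X_{t_0} = \sum_{k\ge0}\left(X_{\pi_{k+1}(t)} - X_{\pi_k(t)}\right)$$
and set for $k\ge0$
$$z_k = 2^{-k}\left(\sqrt{2v^2\left(\log(2^{k+1}N_k) + x\right)} + b\left(\log(2^{k+1}N_k) + x\right)\right).$$
Since $\sum_{k\ge0}z_k\le z = H + 2\sqrt{2v^2x} + 2bx$,
$$\mathbb{P}(Z\ge z)\le\mathbb{P}\left(\exists t\in T, \exists k\ge0,\ X_{\pi_{k+1}(t)} - X_{\pi_k(t)}\ge z_k\right)\le\sum_{k\ge0}\sum_{(s,u)\in E_k}\mathbb{P}\left(X_u - X_s\ge z_k\right)$$
where
$$E_k = \left\{\left(\pi_k(t), \pi_{k+1}(t)\right) \mid t\in T\right\}.$$
Since $\mathcal{A}_{k+1}\subset\mathcal{A}_k$, $\pi_k(t)$ and $\pi_{k+1}(t)$ belong to a same element of $\mathcal{A}_k$. Therefore $d(s, u)\le2^{-k}v$ and $c\,\delta(s, u)\le2^{-k}b$ for all pairs $(s, u)\in E_k$. Besides, under Assumption 1, the random variable $X = X_u - X_s$ with $(s, u)\in E_k$ is centered and satisfies (6) with $2^{-k}v$ and $2^{-k}b$ in place of $v$ and $c$. Hence, by using Bernstein's Inequality (4) (with $\log(2^{k+1}N_k) + x$ in place of $u$), we get for all $(s, u)\in E_k$ and $k\ge0$
$$\mathbb{P}\left(X_u - X_s\ge z_k\right)\le2^{-(k+1)}N_k^{-1}e^{-x}\le2^{-(k+1)}|E_k|^{-1}e^{-x},$$
the last inequality following from $|E_k|\le|\mathcal{A}_k||\mathcal{A}_{k+1}| = N_k$. Finally, we obtain Inequality (27) by summing up these inequalities over $(s, u)\in E_k$ and $k\ge0$.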
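The bookkeeping behind this union bound can be checked numerically. In the sketch below (ours), the cardinalities $N_k = 4^{k+1}$ are hypothetical and the series are truncated after 30 terms. We compute the levels $z_k$ and verify that $\sum_k z_k\le H + 2\sqrt{2v^2x} + 2bx$. We also verify that the probability budgets $2^{-(k+1)}N_k^{-1}e^{-x}$, summed over at most $N_k$ pairs at each scale, add up to at most $e^{-x}$.

```python
import math

def chaining_levels(v, b, N, x):
    """z_k = 2^{-k} (sqrt(2 v^2 (L_k + x)) + b (L_k + x)) with L_k = log(2^{k+1} N_k)."""
    out = []
    for k, Nk in enumerate(N):
        L = (k + 1) * math.log(2.0) + math.log(Nk)
        out.append(2.0 ** (-k) * (math.sqrt(2.0 * v * v * (L + x)) + b * (L + x)))
    return out

def H_value(v, b, N):
    """H of Theorem 5, truncated to the indices k for which N_k is supplied."""
    total = 0.0
    for k, Nk in enumerate(N):
        L = (k + 1) * math.log(2.0) + math.log(Nk)
        total += 2.0 ** (-k) * (math.sqrt(2.0 * v * v * L) + b * L)
    return total

def union_bound(E_sizes, N, x):
    """Sum over k and (s, u) in E_k of 2^{-(k+1)} N_k^{-1} e^{-x}."""
    return sum(Ek * 2.0 ** (-(k + 1)) / Nk * math.exp(-x)
               for k, (Ek, Nk) in enumerate(zip(E_sizes, N)))

N = [4 ** (k + 1) for k in range(30)]        # hypothetical cardinalities N_k
E_sizes = N                                   # worst case |E_k| = N_k
v, b, x = 1.0, 0.5, 2.0
print(sum(chaining_levels(v, b, N, x)),
      H_value(v, b, N) + 2 * math.sqrt(2 * v * v * x) + 2 * b * x)
print(union_bound(E_sizes, N, x), math.exp(-x))
```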

5.2. Proof of Theorem 3. We only prove (9), the argument for proving (10) being the same as that for proving (28). For $t\in S$ and $r > 0$, we denote by $B_2(t, r)$ and $B_\infty(t, r)$ the balls centered at $t$ of radius $r$ associated to $\|\cdot\|_2$ and $\|\cdot\|_\infty$ respectively. In the sequel, we shall use the following result on the entropy of those balls.

Proposition 5. Let $\|\cdot\|$ be an arbitrary norm on $S$ and $B(0, 1)$ the corresponding unit ball. For each $\delta\in(0, 1]$, the minimal number $\mathcal{N}(\delta)$ of balls of radius $\delta$ (with respect to $\|\cdot\|$) which are necessary to cover $B(0, 1)$ satisfies
$$\mathcal{N}(\delta)\le\left(1 + 2\delta^{-1}\right)^D.$$

This lemma can be found in Birgé (1983) (Lemma 4.5, p. 209) with a proof referring to Lorentz (1966). Nevertheless, we provide a proof below to keep this paper as self-contained as possible.
Proof. With no loss of generality, we may assume that $S = \mathbb{R}^D$. Let $\delta\in(0, 1]$. A subset $T$ of $B(0, 1)$ is called $\delta$-separated if for all $s, t\in T$ with $s\ne t$, $\|s - t\|\ge\delta$. If $T$ is $\delta$-separated, the (open) balls centered at the points $t\in T$ with radius $\delta/2$ are pairwise disjoint and included in the ball $B(0, 1 + \delta/2)$. By a volume argument (with respect to the Lebesgue measure on $\mathbb{R}^D$), we deduce that $T$ is finite and satisfies $|T|\le(1 + 2\delta^{-1})^D$. Consider now a maximal $\delta$-separated set $T$, that is
$$|T| = \max_{T'}|T'|$$
where $T'$ runs among the family of all the $\delta$-separated subsets of $B(0, 1)$. By definition, for all $t\in B(0, 1)\setminus T$, $T\cup\{t\}$ is no longer $\delta$-separated. Therefore the family of balls $\{B(t, \delta), t\in T\}$ covers $B(0, 1)$. Consequently
$$\mathcal{N}(\delta)\le|T|\le\left(1 + 2\delta^{-1}\right)^D.$$
□
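The volume argument can be illustrated in the plane. The sketch below is ours and illustrative; it works on a finite grid of the unit disk rather than on the whole ball. It builds a $\delta$-separated set greedily and checks that its size never exceeds $(1 + 2/\delta)^2$. It also checks the maximality property used in the proof: every grid point lies within distance $\delta$ of a retained point.

```python
import math

def unit_disk_grid(step):
    """Grid points of spacing `step` inside the closed Euclidean unit disk of R^2."""
    m = int(round(1.0 / step))
    return [(i * step, j * step)
            for i in range(-m, m + 1) for j in range(-m, m + 1)
            if math.hypot(i * step, j * step) <= 1.0]

def greedy_separated_set(points, delta):
    """Keep a point only if it is at distance >= delta from every point kept so far."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= delta for q in kept):
            kept.append(p)
    return kept

grid = unit_disk_grid(0.05)
for delta in (1.0, 0.5, 0.25):
    T = greedy_separated_set(grid, delta)
    # Volume bound of the proof: |T| <= (1 + 2 / delta)^D with D = 2.
    print(delta, len(T), (1.0 + 2.0 / delta) ** 2)
```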

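Let us now turn to the proof of (9). Note that it is enough to prove the following: for some $u\le\kappa\left(\sqrt{v^2(D + x)} + b(D + x)\right)$ not depending on $T$, and for all finite sets $T$ satisfying Inequalities (8) and (26),
$$\mathbb{P}\left(\sup_{t\in T}\left(X_t - X_{t_0}\right) > u\right)\le e^{-x}.$$
Indeed, take any sequence $(T_n)_{n\ge0}$ of finite subsets of $T$ increasing towards $T$, that is, satisfying $T_n\subset T_{n+1}$ for all $n\ge0$ and $\bigcup_{n\ge0}T_n = T$. Then the event
$$\left\{\sup_{t\in T_n}\left(X_t - X_{t_0}\right) > u\right\}$$
increases (for the inclusion) towards $\{Z > u\}$. Therefore,
$$\mathbb{P}(Z > u) = \lim_{n\to+\infty}\mathbb{P}\left(\sup_{t\in T_n}\left(X_t - X_{t_0}\right) > u\right).$$
Consequently, we shall assume hereafter that $T$ is finite.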

For $k\ge0$ and $j\in\{2, \infty\}$ define the sets $\mathcal{A}_{j,k}$ as follows. We first consider the case $j = 2$. For $k = 0$, $\mathcal{A}_{2,0} = \{T\}$. By applying Proposition 5 with $\|\cdot\| = \|\cdot\|_2/v$ and $\delta = 1/4$, we can cover $T\subset B_2(t_0, v)$ with at most $9^D$ balls with radius $v/4$. From such a finite covering $\{B_1, \ldots, B_N\}$ with $N\le9^D$, it is easy to derive a partition $\mathcal{A}_{2,1}$ of $T$ by at most $9^D$ sets of diameter not larger than $v/2$. Indeed, $\mathcal{A}_{2,1}$ can merely consist of the non-empty sets among the family
$$\left(B_k\setminus\bigcup_{1\le\ell<k}B_\ell\right)\cap T, \quad k = 1, \ldots, N$$
(with the convention $\bigcup_\emptyset = \emptyset$). Then, for $k\ge2$, proceed by induction using Proposition 5 repeatedly. Each element $A\in\mathcal{A}_{2,k-1}$ is a subset of a ball of radius $2^{-k}v$ and can be partitioned similarly as before into $5^D$ subsets of balls of radii $2^{-(k+1)}v$. By doing so, the partitions $\mathcal{A}_{2,k}$ with $k\ge1$ satisfy $\mathcal{A}_{2,k}\subset\mathcal{A}_{2,k-1}$, $|\mathcal{A}_{2,k}|\le(1.8)^D\times5^{kD}$ and for all $A\in\mathcal{A}_{2,k}$,
$$\sup_{s,t\in A}\|s - t\|_2\le2^{-k}v.$$

Let us now turn to the case $j = \infty$. If $c > 0$, define the partitions $\mathcal{A}_{\infty,k}$ in exactly the same way as we did for the $\mathcal{A}_{2,k}$. Similarly, the partitions $\mathcal{A}_{\infty,k}$ with $k\ge1$ satisfy $\mathcal{A}_{\infty,k}\subset\mathcal{A}_{\infty,k-1}$, $|\mathcal{A}_{\infty,k}|\le(1.8)^D\times5^{kD}$ and for all $A\in\mathcal{A}_{\infty,k}$,
$$\sup_{s,t\in A}c\|s - t\|_\infty\le2^{-k}b.$$
When $c = 0$, we simply take $\mathcal{A}_{\infty,k} = \{T\}$ for all $k\ge0$ and note that the properties above are fulfilled as well.

Finally, define the partition $\mathcal{A}_k$ for $k\ge0$ as that generated by $\mathcal{A}_{2,k}$ and $\mathcal{A}_{\infty,k}$, that is
$$\mathcal{A}_k = \left\{A_2\cap A_\infty \mid A_2\in\mathcal{A}_{2,k},\ A_\infty\in\mathcal{A}_{\infty,k}\right\}.$$
Clearly, $\mathcal{A}_{k+1}\subset\mathcal{A}_k$. Besides, $|\mathcal{A}_0| = 1$ and for $k\ge1$,
$$|\mathcal{A}_k|\le|\mathcal{A}_{2,k}||\mathcal{A}_{\infty,k}|\le(1.8)^{2D}\times5^{2kD}.$$

The set $T$ being finite, we can apply Theorem 5. Actually, our construction of the $\mathcal{A}_k$ allows us to slightly gain in the constants. Going back to the proof of Theorem 5, we note that
$$|E_k| = \left|\left\{\left(\pi_k(t), \pi_{k+1}(t)\right) \mid t\in T\right\}\right|\le|\mathcal{A}_{k+1}|\le9^{2D}\times5^{2kD}$$
since the element $\pi_{k+1}(t)$ determines $\pi_k(t)$ in a unique way. This means that one can take $N_k = 9^{2D}\times5^{2kD}$ in the proof of Theorem 5. By taking the notations of Theorem 5, we have,
the notations of Theorem 5, we have, 

$$H\le\sum_{k\ge0}2^{-k}\left[\sqrt{2v^2\log\left(2^{k+1}\times9^{2D}\times5^{2kD}\right)} + b\log\left(2^{k+1}\times9^{2D}\times5^{2kD}\right)\right]$$

< uVDv^ + 18Db 

and using the concavity of $x\mapsto\sqrt{x}$, we get

H + 2V2v'^x + 2bx < uVd^ + 2V2v^x + 18b{D + x) 

< 18 ( ^^2 (D + x) + b{D + x)) . 

which leads to the result. 
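The numerical constant can be checked by evaluating the series directly. The sketch below is ours and truncates the series after 200 terms, which is harmless since the terms decay like $2^{-k}$. It computes $H$ for $N_k = 9^{2D}\times5^{2kD}$ and verifies that $H + 2\sqrt{2v^2x} + 2bx\le18(\sqrt{v^2(D + x)} + b(D + x))$ on a grid of values. Note that the $b$-part of $H$ equals $b\sum_k2^{-k}\log(2^{k+1}9^{2D}5^{2kD}) = b(4\log2 + 4D\log9 + 4D\log5)$. This is at most $18Db$ for $D\ge1$.

```python
import math

def H_bound(v, b, D, kmax=200):
    """Series for H with N_k = 9^(2D) * 5^(2kD), truncated after kmax terms."""
    total = 0.0
    for k in range(kmax):
        L = (k + 1) * math.log(2.0) + 2 * D * math.log(9.0) + 2 * k * D * math.log(5.0)
        total += 2.0 ** (-k) * (math.sqrt(2.0 * v * v * L) + b * L)
    return total

def lhs(v, b, D, x):
    """H + 2 sqrt(2 v^2 x) + 2 b x."""
    return H_bound(v, b, D) + 2.0 * math.sqrt(2.0 * v * v * x) + 2.0 * b * x

def rhs(v, b, D, x):
    """18 (sqrt(v^2 (D + x)) + b (D + x))."""
    return 18.0 * (math.sqrt(v * v * (D + x)) + b * (D + x))

for D in (1, 2, 10, 100):
    # Contributions of the two terms of H, per unit of sqrt(D) v and of D b.
    print(D, H_bound(1.0, 0.0, D) / math.sqrt(D), H_bound(0.0, 1.0, D) / D)
```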

5.3. A control of $\chi^2$-type random variables. We have the following result.

Theorem 6. Let $S$ be some linear subspace of $\mathbb{R}^n$ with dimension $D$. If the coordinates of $\xi$ are independent and satisfy (15), for all $x, u > 0$,
$$\mathbb{P}\left(|\Pi_S\xi|_2^2\ge\kappa^2\left(\sigma^2 + \frac{2cu}{\kappa}\right)(D + x),\ |\Pi_S\xi|_\infty\le u\right)\le e^{-x} \tag{29}$$
with $\kappa = 18$ and
$$\mathbb{P}\left(|\Pi_S\xi|_\infty\ge x\right)\le2n\exp\left(-\frac{x^2}{2\Lambda_2^2(S)\left(\sigma^2 + cx\right)}\right) \tag{30}$$
where $\Lambda_2(S)$ is defined by (22).



Proof. Let us set $\chi = |\Pi_S\xi|_2$. For $t \in S$, let $X_t = \langle \xi, t\rangle$ and $t_0 = 0$. It follows from the independence of the $\xi_i$ and Inequality (15) that (8) holds with $d(t, s) = \sigma|t - s|_2$ and $\delta(t, s) = c|t - s|_\infty$ for all $s, t \in S$. The random variable $\chi$ equals the supremum of the $X_t$ when $t$ runs among those elements $t$ of $S$ satisfying $|t|_2 \le 1$. Besides, the supremum is achieved for $t = \Pi_S\xi/\chi$ and thus, on the event $\{\chi \ge z,\ |\Pi_S\xi|_\infty \le u\}$,

$$\chi = \sup_{t \in T} X_t \quad\text{with}\quad T = \left\{t \in S,\ |t|_2 \le 1,\ |t|_\infty \le u z^{-1}\right\},$$

leading to the bound

$$\mathbb{P}\left(\chi \ge z,\ |\Pi_S\xi|_\infty \le u\right) \le \mathbb{P}\left(\sup_{t \in T} X_t \ge z\right).$$

We take $z = \kappa\sqrt{\left(\sigma^2 + 2cu\kappa^{-1}\right)(D + x)}$ and (using the concavity of $x \mapsto \sqrt{x}$) note that

$$z \ge \kappa\left(\sqrt{\sigma^2(D + x)} + cuz^{-1}(D + x)\right).$$

Then, by applying Theorem 3 with $v = \sigma$ and $b = cu/z$, we obtain Inequality (29).
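The inequality on $z$ used just before applying Theorem 3 can also be checked numerically. Writing $A = \kappa^2\sigma^2(D+x)$ and $B = \kappa cu(D+x)$, one has $z^2 = A + 2B$, and $(A + B)^2 \ge A(A + 2B)$ gives $z^2 - B \ge \sqrt{A}\,z$, which is the claim after division by $z$. The sketch below draws parameters from arbitrary ranges and records the smallest value of the gap $z - \kappa(\sqrt{\sigma^2(D+x)} + cuz^{-1}(D+x))$.

```python
import math
import random

KAPPA = 18.0


def z_level(sigma2, c, u, D, x, kappa=KAPPA):
    """z = kappa * sqrt((sigma^2 + 2cu/kappa)(D + x)), the level chosen in the proof."""
    return kappa * math.sqrt((sigma2 + 2.0 * c * u / kappa) * (D + x))


def target_level(sigma2, c, u, D, x, z, kappa=KAPPA):
    """kappa * (sqrt(sigma^2 (D + x)) + b (D + x)) with v = sigma and b = cu/z."""
    return kappa * (math.sqrt(sigma2 * (D + x)) + (c * u / z) * (D + x))


rng = random.Random(0)
worst_gap = math.inf  # smallest observed value of z - target_level
for _ in range(10_000):
    sigma2 = rng.uniform(0.01, 10.0)
    c = rng.uniform(0.0, 5.0)
    u = rng.uniform(0.01, 10.0)
    D = rng.randint(1, 200)
    x = rng.uniform(0.0, 50.0)
    z = z_level(sigma2, c, u, D, x)
    worst_gap = min(worst_gap, z - target_level(sigma2, c, u, D, x, z))
print(worst_gap)
```

The gap is close to zero only when $cu$ is small compared with $\kappa\sigma^2$, in which case both sides reduce to $\kappa\sigma\sqrt{D+x}$.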

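Let us now turn to Inequality (30). Under (15), we can apply Bernstein's Inequality (4) to $X = \langle \xi, t\rangle$ and $X = \langle -\xi, t\rangle$ with $t \in S$, $v^2 = \sigma^2|t|_2^2$ and $c|t|_\infty$ in place of $c$, and get for all $t \in S$ and $x > 0$

$$\mathbb{P}\left(|\langle \xi, t\rangle| \ge x\right) \le 2\exp\left[-\frac{x^2}{2\left(\sigma^2|t|_2^2 + c|t|_\infty x\right)}\right]. \tag{31}$$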
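Let us take $t = \Pi_S e_i$ with $i \in \{1, \dots, n\}$. Since $|t|_2 \le \Lambda_2(S)$ and

$$|t|_\infty \le \max_{i,i'=1,\dots,n} |\langle \Pi_S e_i, e_{i'}\rangle| = \max_{i,i'=1,\dots,n} |\langle \Pi_S e_i, \Pi_S e_{i'}\rangle| \le \Lambda_2^2(S),$$

and since $\langle \Pi_S\xi, e_i\rangle = \langle \xi, \Pi_S e_i\rangle$, we obtain for all $i \in \{1, \dots, n\}$

$$\mathbb{P}\left(|\langle \Pi_S\xi, e_i\rangle| \ge x\right) \le 2\exp\left[-\frac{x^2}{2\Lambda_2^2(S)\left(\sigma^2 + cx\right)}\right].$$

We obtain Inequality (30) by summing up these probabilities for $i = 1, \dots, n$.

□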
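The only properties of $\Lambda_2(S)$ used in this proof are $|\Pi_S e_i|_2 \le \Lambda_2(S)$ for every $i$ and the resulting entrywise bound $|\langle \Pi_S e_i, \Pi_S e_{i'}\rangle| \le \Lambda_2^2(S)$. Assuming for illustration that $\Lambda_2(S) = \max_i |\Pi_S e_i|_2$ (definition (22) is not reproduced in this section), the sketch below builds $\Pi_S$ for a random 3-dimensional subspace of $\mathbb{R}^{12}$ with Gram–Schmidt and compares the largest entry of $\Pi_S$ with $\Lambda_2^2(S)$. It relies on $|\Pi_S e_i|_2^2 = (\Pi_S)_{ii}$, which holds because $\Pi_S$ is symmetric and idempotent.

```python
import math
import random


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis (list of lists) of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def projection_matrix(basis, n):
    """Matrix of Pi_S = sum_q q q^T for an orthonormal basis of S."""
    return [[sum(q[i] * q[j] for q in basis) for j in range(n)] for i in range(n)]


n = 12
rng = random.Random(1)
S_basis = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)])
P = projection_matrix(S_basis, n)

# |Pi_S e_i|_2^2 = <Pi_S e_i, e_i> = P[i][i]; assumed: Lambda_2(S) = max_i |Pi_S e_i|_2
lambda2_sq = max(P[i][i] for i in range(n))
max_entry = max(abs(P[i][j]) for i in range(n) for j in range(n))
print(max_entry <= lambda2_sq)
```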
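5.4. Proof of Theorem 4. Let us fix some $m \in \mathcal{M}$. It follows from simple algebra and the inequality $\mathrm{crit}(\hat m) \le \mathrm{crit}(m)$ that

$$|f - \hat f_{\hat m}|_2^2 \le |f - f_m|_2^2 + 2\langle \xi, \hat f_{\hat m} - f_m\rangle + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$$

Using the elementary inequality $2ab \le a^2 + b^2$ for all $a, b \in \mathbb{R}$ and the fact that $\hat f_{\hat m} - f_m \in S_m + S_{\hat m}$, we have for $K > 1$,

$$\begin{aligned}
2\langle \xi, \hat f_{\hat m} - f_m\rangle &\le K^{-1}|\hat f_{\hat m} - f_m|_2^2 + K|\Pi_{S_m + S_{\hat m}}\xi|_2^2\\
&\le K^{-1}\left(|f - \hat f_{\hat m}|_2 + |f - f_m|_2\right)^2 + K|\Pi_{S_m + S_{\hat m}}\xi|_2^2\\
&\le K^{-1}\left[\left(1 + \frac{K-1}{K}\right)|f - \hat f_{\hat m}|_2^2 + \left(1 + \frac{K}{K-1}\right)|f - f_m|_2^2\right] + K|\Pi_{S_m + S_{\hat m}}\xi|_2^2
\end{aligned}$$

and we derive

$$\begin{aligned}
\left(\frac{K-1}{K}\right)^2|f - \hat f_{\hat m}|_2^2 &\le \left[1 + \frac{1}{K}\left(1 + \frac{K}{K-1}\right)\right]|f - f_m|_2^2 + K|\Pi_{S_m + S_{\hat m}}\xi|_2^2 - \left(\mathrm{pen}(\hat m) - \mathrm{pen}(m)\right)\\
&= \frac{K^2 + K - 1}{K(K-1)}|f - f_m|_2^2 + 2\,\mathrm{pen}(m) + K|\Pi_{S_m + S_{\hat m}}\xi|_2^2 - \left(\mathrm{pen}(m) + \mathrm{pen}(\hat m)\right).
\end{aligned}$$

Setting, for $m' \in \mathcal{M}$,

$$A_1(m') = K\kappa^2\left(\sigma^2 + \frac{2cu}{\kappa}\right)\left(\frac{|\Pi_{S_m + S_{m'}}\xi|_2^2}{\kappa^2\left(\sigma^2 + 2cu\kappa^{-1}\right)} - D_m - D_{m'} - \Delta_m - \Delta_{m'}\right)_+ \mathbb{1}\left\{|\Pi_{S_m + S_{m'}}\xi|_\infty \le u\right\}$$

and $A_2(m') = K|\Pi_{S_m + S_{m'}}\xi|_2^2\,\mathbb{1}\{|\Pi_{S_m + S_{m'}}\xi|_\infty > u\}$, and using (23), we deduce that

$$\left(\frac{K-1}{K}\right)^2|f - \hat f_{\hat m}|_2^2 \le \frac{K^2 + K - 1}{K(K-1)}|f - f_m|_2^2 + 2\,\mathrm{pen}(m) + A_1(\hat m) + A_2(\hat m),$$

and by taking the expectation on both sides we get

$$\left(\frac{K-1}{K}\right)^2\mathbb{E}\left[|f - \hat f_{\hat m}|_2^2\right] \le \frac{K^2 + K - 1}{K(K-1)}|f - f_m|_2^2 + 2\,\mathrm{pen}(m) + \mathbb{E}\left[A_1(\hat m)\right] + \mathbb{E}\left[A_2(\hat m)\right].$$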
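The coefficient $(K^2 + K - 1)/(K(K-1))$ can be checked in exact arithmetic. Assuming that the square is split as $|a + b|^2 \le (1 + \gamma)|a|^2 + (1 + \gamma^{-1})|b|^2$ with $\gamma = (K-1)/K$ (our reading of the display, with $a = f - \hat f_{\hat m}$ and $b = f - f_m$), the sketch below computes the resulting coefficients of $|f - \hat f_{\hat m}|_2^2$ and $|f - f_m|_2^2$ with Python's `fractions` module, for a few values of $K > 1$.

```python
from fractions import Fraction


def split_coefficients(K):
    """Left coefficient (of |f - hat f|^2) and right coefficient (of |f - f_m|^2)
    obtained from 2<xi, hat f - f_m> <= |hat f - f_m|^2 / K + K |Pi xi|^2 and
    |a + b|^2 <= (1 + g)|a|^2 + (1 + 1/g)|b|^2 with g = (K - 1)/K (assumed split)."""
    K = Fraction(K)
    g = (K - 1) / K
    left = 1 - (1 + g) / K
    right = 1 + (1 + 1 / g) / K
    return left, right


for K in (2, 3, Fraction(5, 2), 10):
    print(K, *split_coefficients(K))
```

For instance $K = 2$ gives the pair $(1/4, 5/2)$. The choice of $K$ trades the bias constant on the right against the factor $((K-1)/K)^2$ on the left.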

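The index $m$ being arbitrary, it remains to bound $E_1 = \mathbb{E}[A_1(\hat m)]$ and $E_2 = \mathbb{E}[A_2(\hat m)]$ from above.

Let $m'$ be some deterministic index in $\mathcal{M}$. By using Theorem 6 with $S = S_m + S_{m'}$, the dimension of which is not larger than $D_m + D_{m'}$, and integrating (29) with respect to $x$, we get

$$\mathbb{E}\left[A_1(m')\right] \le K\kappa^2\left(\sigma^2 + 2cu\kappa^{-1}\right)e^{-\Delta_m - \Delta_{m'}}$$

and thus

$$E_1 \le \sum_{m' \in \mathcal{M}}\mathbb{E}\left[A_1(m')\right] \le K\kappa^2\left(\sigma^2 + 2cu\kappa^{-1}\right)\sum_{m' \in \mathcal{M}} e^{-\Delta_{m'}}.$$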
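The integration step is the identity $\mathbb{E}[Y] = \int_0^{+\infty}\mathbb{P}(Y > y)\,dy$ for the nonnegative variable $Y = A_1(m')$. Writing $s = K\kappa^2(\sigma^2 + 2cu\kappa^{-1})$ and $\delta$ for the sum of the two weight terms subtracted in the definition of $A_1(m')$, Inequality (29) applied with $x = \delta + y/s$ gives $\mathbb{P}(A_1(m') > y) \le e^{-\delta - y/s}$, whose integral over $(0, +\infty)$ is $s e^{-\delta}$. The sketch below only confirms this last integral numerically with a trapezoidal rule; the values of $s$ and $\delta$ are arbitrary.

```python
import math


def integrated_tail_bound(s, delta, steps=200_000, span=60.0):
    """Trapezoidal approximation of int_0^inf exp(-delta - y/s) dy,
    truncated at y = span * s, where the integrand is below exp(-span)."""
    upper = span * s
    h = upper / steps

    def tail(y):
        return math.exp(-delta - y / s)

    total = 0.5 * (tail(0.0) + tail(upper)) + sum(tail(k * h) for k in range(1, steps))
    return total * h


s, delta = 2.5, 1.3  # arbitrary values of s and delta
print(integrated_tail_bound(s, delta), s * math.exp(-delta))
```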

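Let us now turn to $E_2 = \mathbb{E}[A_2(\hat m)]$. Since $S_m + S_{\hat m} \subset S_n$, we have $|\Pi_{S_m + S_{\hat m}}\xi|_2^2 \le |\Pi_{S_n}\xi|_2^2 \le n|\Pi_{S_n}\xi|_\infty^2$. Besides, it follows from the definition of $\Lambda_\infty$ that

$$|\Pi_{S_m + S_{\hat m}}\xi|_\infty = |\Pi_{S_m + S_{\hat m}}\Pi_{S_n}\xi|_\infty \le \Lambda_\infty|\Pi_{S_n}\xi|_\infty,$$

and therefore, setting $x_0 = \Lambda_\infty^{-1}u$,

$$E_2 \le Kn\,\mathbb{E}\left[|\Pi_{S_n}\xi|_\infty^2\,\mathbb{1}\left\{|\Pi_{S_n}\xi|_\infty > x_0\right\}\right].$$

We shall now use the following lemma, the proof of which is deferred to the end of the section.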

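Lemma 1. Let $X$ be some nonnegative random variable satisfying, for all $x > 0$,

$$\mathbb{P}(X > x) \le a\exp\left[-\phi(x)\right] \quad\text{with}\quad \phi(x) = \frac{x^2}{2(\alpha + \beta x)}, \tag{32}$$

where $a, \alpha > 0$ and $\beta \ge 0$. For $x_0 > 0$ such that $\phi(x_0) \ge 1$,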



E 



[Xn {X > xo}] < axge-'^(-°) (^1 + , Vp > L 



We apply the lemma with $p = 2$ and $X = |\Pi_{S_n}\xi|_\infty$, which, as we know from (30), satisfies (32) with $a = 2n$, $\alpha = \Lambda_2^2(S_n)\sigma^2$ and $\beta = \Lambda_2^2(S_n)c$. Besides, it follows from the definition of $x_0$ and the fact that $n \ge 2$ that

= 2Ai(S)(l + c.o) ^'°^ 

The assumptions of Lemma 1 being checked, we deduce that E2 < 2KxQe~^ 
and conclude the proof by putting these upper bounds on $E_1$ and $E_2$ together.

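Let us now turn to the proof of the lemma.

Proof of Lemma 1. Since

$$\mathbb{E}\left[X^p\mathbb{1}\{X > x_0\}\right] \le x_0^p\,\mathbb{P}(X > x_0) + \int_{x_0}^{+\infty} p x^{p-1}\,\mathbb{P}(X > x)\,dx,$$

it remains to bound the integral from above. Let us set

$$I_p = \int_{x_0}^{+\infty} p x^{p-1} e^{-\phi(x)}\,dx, \qquad I_0 = 0.$$

Note that $\phi'$ is increasing, and by integrating by parts we have

$$I_p \le \frac{1}{\phi'(x_0)}\int_{x_0}^{+\infty} p x^{p-1}\phi'(x)e^{-\phi(x)}\,dx = \frac{p}{\phi'(x_0)}\left(x_0^{p-1}e^{-\phi(x_0)} + I_{p-1}\right).$$

By induction over $p$ and using that $x_0\phi'(x_0) \ge \phi(x_0) \ge 1$, we get

□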
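A numerical check of the integration-by-parts step, for the arbitrary choice $\alpha = 1$, $\beta = 0.5$, $x_0 = 3$ (so that $\phi(x_0) = 1.8 \ge 1$). The sketch approximates $I_p$ with Simpson's rule on $[x_0, x_0 + 200]$, beyond which the integrand is negligible, and compares it with $\frac{p}{\phi'(x_0)}\left(x_0^{p-1}e^{-\phi(x_0)} + I_{p-1}\right)$ for $p = 1, \dots, 4$, with the convention $I_0 = 0$. The monotonicity of $\phi'$ follows from $\phi''(x) = \alpha^2/(\alpha + \beta x)^3 > 0$.

```python
import math


def phi(x, alpha, beta):
    return x * x / (2.0 * (alpha + beta * x))


def dphi(x, alpha, beta):
    # phi'(x); increasing because phi''(x) = alpha^2 / (alpha + beta x)^3 > 0
    return x * (2.0 * alpha + beta * x) / (2.0 * (alpha + beta * x) ** 2)


def integral_p(p, x0, alpha, beta, length=200.0, m=20_000):
    """Simpson approximation of I_p = int_{x0}^{inf} p x^(p-1) exp(-phi(x)) dx,
    truncated at x0 + length; I_0 = 0 by convention. m must be even."""
    if p == 0:
        return 0.0

    def g(x):
        return p * x ** (p - 1) * math.exp(-phi(x, alpha, beta))

    h = length / m
    s = g(x0) + g(x0 + length)
    s += 4.0 * sum(g(x0 + (2 * k - 1) * h) for k in range(1, m // 2 + 1))
    s += 2.0 * sum(g(x0 + 2 * k * h) for k in range(1, m // 2))
    return s * h / 3.0


alpha, beta, x0 = 1.0, 0.5, 3.0  # arbitrary; phi(x0) = 1.8 >= 1
results = []
for p in range(1, 5):
    lhs = integral_p(p, x0, alpha, beta)
    rhs = p / dphi(x0, alpha, beta) * (
        x0 ** (p - 1) * math.exp(-phi(x0, alpha, beta)) + integral_p(p - 1, x0, alpha, beta)
    )
    results.append((p, lhs, rhs))
    print(p, lhs, rhs)
```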



5.5. Proof of Proposition 1. Let $m$ be some partition of $\{1, \dots, n\}$. By applying Proposition 4 with $J = \{1\}$, $P = m$ and $\Phi = 1$, we obtain

$$\Lambda_2^2(S_m) \le \frac{1}{\min_{I \in m}|I|} \quad\text{and}\quad \Lambda_\infty(S_m) \le 1.$$

In fact, one can check that these inequalities are equalities. Since $S_m \subset S_{\bar m}$ for all $m \in \mathcal{M}$, we deduce that under (17)

KliSn) < AUSm) < \, , 

log (n) 

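For two partitions $m, m'$ of $\{1, \dots, n\}$, define

$$m \vee m' = \left\{I \cap I' \mid I \in m,\ I' \in m',\ I \cap I' \neq \emptyset\right\}. \tag{33}$$

Since the elements of $m, m'$ for $m, m' \in \mathcal{M}$ consist of consecutive integers, $S_{m \vee m'} = S_m + S_{m'}$ and therefore

$$\Lambda_\infty = \sup_{m,m' \in \mathcal{M}}\Lambda_\infty(S_m + S_{m'}) = \sup_{m,m' \in \mathcal{M}}\Lambda_\infty(S_{m \vee m'}) = 1.$$

The result follows by applying Theorem 4 with $z = 6\log(n)$.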
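For histogram spaces, $\Pi_{S_m}$ averages a vector over each cell, which makes the equalities $\Lambda_2^2(S_m) = 1/\min_{I \in m}|I|$ and $\Lambda_\infty(S_m) = 1$ easy to check on an example. The sketch below assumes that $\Lambda_2^2(S) = \max_i (\Pi_S)_{ii}$ and that $\Lambda_\infty(S)$ is the maximal absolute row sum of the matrix of $\Pi_S$ (its $\ell_\infty$ operator norm); definition (22) is not reproduced here, so both are stated assumptions. Indices are 0-based and the two interval partitions of $\{0, \dots, 9\}$ are hypothetical.

```python
from fractions import Fraction


def histogram_projection(partition, n):
    """Matrix of Pi_{S_m} for vectors constant on each cell: averaging over cells."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for cell in partition:
        for i in cell:
            for j in cell:
                P[i][j] = Fraction(1, len(cell))
    return P


def lambda2_sq(P):
    # assumed definition: max_i |Pi e_i|_2^2 = max_i Pi_ii
    return max(P[i][i] for i in range(len(P)))


def lambda_inf(P):
    # assumed definition: l_inf operator norm of Pi = maximal absolute row sum
    return max(sum(abs(v) for v in row) for row in P)


def join(m1, m2):
    """m v m' as in (33): the non-empty intersections of cells of m1 and m2."""
    return [sorted(set(a) & set(b)) for a in m1 for b in m2 if set(a) & set(b)]


n = 10
m1 = [range(0, 4), range(4, 10)]
m2 = [range(0, 2), range(2, 7), range(7, 10)]
mj = join(m1, m2)
for part in (m1, m2, mj):
    P = histogram_projection(part, n)
    print(lambda2_sq(P), lambda_inf(P))
```

Exact fractions are used so that the equalities hold without rounding.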

5.6. Proof of Proposition 2. Let $m$ be a partition of $\{1, \dots, n\}$ such that for all $I \in m$, $I$ consists of consecutive integers and $|I| > d$. As proved in Mason & Handscomb (2003), an orthonormal basis of $S_m$ is given by the vectors $\phi_{j,I}$ defined by

$$\langle \phi_{0,I}, e_i\rangle = \frac{1}{\sqrt{|I|}}\mathbb{1}_I(i)$$

and, for $j = 1, \dots, d$,

$$\langle \phi_{j,I}, e_i\rangle = \sqrt{\frac{2}{|I|}}\,Q_j\!\left(\cos\left(\frac{(i - \min I + 1/2)\pi}{|I|}\right)\right)\mathbb{1}_I(i),$$

where $Q_j$ is the Chebyshev polynomial of degree $j$, defined on $[-1, 1]$ by the formula

$$Q_j(x) = \cos(j\theta) \quad\text{if } x = \cos\theta.$$
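The orthonormality of the $\phi_{j,I}$ is the discrete orthogonality of cosines at the Chebyshev nodes $\cos((k + 1/2)\pi/|I|)$, $k = 0, \dots, |I| - 1$, and it holds as long as $d < |I|$. The sketch below checks it numerically on a hypothetical partition of $\{0, \dots, 11\}$ (0-based indices) into two cells of sizes 5 and 7, with $d = 3$, evaluating $Q_j$ through $Q_j(x) = \cos(j \arccos x)$.

```python
import math


def chebyshev_Q(j, x):
    """Q_j(x) = cos(j * arccos(x)) for x in [-1, 1]."""
    return math.cos(j * math.acos(x))


def block_basis(start, size, d, n):
    """Vectors phi_{j,I}, j = 0..d, for the cell I = {start, ..., start + size - 1}."""
    vectors = []
    for j in range(d + 1):
        v = [0.0] * n
        for i in range(start, start + size):
            if j == 0:
                v[i] = 1.0 / math.sqrt(size)
            else:
                node = math.cos((i - start + 0.5) * math.pi / size)  # Chebyshev node
                v[i] = math.sqrt(2.0 / size) * chebyshev_Q(j, node)
        vectors.append(v)
    return vectors


n, d = 12, 3
basis = block_basis(0, 5, d, n) + block_basis(5, 7, d, n)  # cell sizes 5 and 7 exceed d
gram = [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]
max_dev = max(abs(gram[r][s] - (1.0 if r == s else 0.0))
              for r in range(len(basis)) for s in range(len(basis)))
print(max_dev)
```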
By applying Proposition 4 with $\Phi = \sqrt{2}$, $P = m$ and $J = \{0, \dots, d\}$, we get

$$\Lambda_2^2(S_m) \le \frac{2(d+1)}{\min_{I \in m}|I|} \quad\text{and}\quad \Lambda_\infty(S_m) \le 2(d+1).$$
Since $S_m \subset S_{\bar m}$ for those $m \in \mathcal{M}$, we have $S_n = \sum_{m \in \mathcal{M}} S_m \subset S_{\bar m}$ and therefore

Ai(5n) < Ai(5^) < ^ 



a? log^(n) 

Moreover, since the elements of $m$ and $m'$ for $m, m' \in \mathcal{M}$ consist of consecutive integers, $S_m + S_{m'} = S_{m \vee m'}$ with $m \vee m'$ defined by (33), and

$$\sup_{m,m' \in \mathcal{M}}\Lambda_\infty(S_m + S_{m'}) = \sup_{m,m' \in \mathcal{M}}\Lambda_\infty(S_{m \vee m'}) \le 2(d+1),$$

which implies that $\Lambda_\infty \le 2(d+1)$. It remains to apply Theorem 4 with $z = 6\log(n)$.
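The two bounds $\Lambda_2^2(S_m) \le 2(d+1)/\min_{I \in m}|I|$ and $\Lambda_\infty(S_m) \le 2(d+1)$ can be compared with actual values on a small example. The sketch below assumes $\Lambda_2^2(S) = \max_i (\Pi_S)_{ii}$ and takes $\Lambda_\infty(S)$ to be the maximal absolute row sum of $\Pi_S$ (definition (22) is not reproduced in this section). It builds $\Pi_{S_m}$ from the basis $\phi_{j,I}$, written with $Q_j(\cos t) = \cos(jt)$, for a hypothetical partition of $\{0, \dots, 19\}$ (0-based) into three consecutive cells and $d = 2$.

```python
import math


def block_cosine_basis(partition, d, n):
    """phi_{0,I} = |I|^{-1/2} 1_I and, for j = 1..d,
    phi_{j,I}(i) = sqrt(2/|I|) cos(j (i - min I + 1/2) pi / |I|) on I,
    i.e. the Chebyshev formula rewritten with Q_j(cos t) = cos(j t)."""
    basis = []
    for cell in partition:
        start, size = min(cell), len(cell)
        for j in range(d + 1):
            v = [0.0] * n
            for i in cell:
                if j == 0:
                    v[i] = 1.0 / math.sqrt(size)
                else:
                    v[i] = math.sqrt(2.0 / size) * math.cos(j * (i - start + 0.5) * math.pi / size)
            basis.append(v)
    return basis


def projection(basis, n):
    """Pi_S = sum_q q q^T for an orthonormal family q."""
    return [[sum(q[i] * q[k] for q in basis) for k in range(n)] for i in range(n)]


n, d = 20, 2
m = [range(0, 6), range(6, 13), range(13, 20)]  # hypothetical cells, all with |I| > d
P = projection(block_cosine_basis(m, d, n), n)
lam2_sq = max(P[i][i] for i in range(n))              # assumed form of Lambda_2^2(S_m)
lam_inf = max(sum(abs(x) for x in row) for row in P)  # assumed form of Lambda_inf(S_m)
print(lam2_sq, 2 * (d + 1) / min(len(c) for c in m))
print(lam_inf, 2 * (d + 1))
```

The trace of $\Pi_{S_m}$ equals $\dim S_m = 3(d+1)$, which is a convenient sanity check on the construction.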

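5.7. Proof of Proposition 3. Let $\bar m = \{0, \dots, 2D\}$. Under the assumption on $2D + 1$ made in Proposition 3, for all $m \subset \bar m$, the family of vectors $\{\psi_j\}_{j \in m}$ is an orthonormal basis of $S_m$. By applying Proposition 4 with $P$ reduced to $\{\{1, \dots, n\}\}$, $J = m$ and $\Phi = \sqrt{2}$, we get

$$\Lambda_2^2(S_m) \le \frac{2|m|}{n} \quad\text{and}\quad \Lambda_\infty(S_m) \le \sqrt{n}\,\Lambda_2(S_m) \le \sqrt{2|m|}.$$

Since $S_m \subset S_{\bar m}$ for all $m \in \mathcal{M}$, we have $S_n = \sum_{m \in \mathcal{M}} S_m \subset S_{\bar m}$ and therefore

$$\Lambda_2^2(S_n) \le \Lambda_2^2(S_{\bar m}) \le \frac{2(2D+1)}{n}.$$

Moreover, for all $m, m' \in \mathcal{M}$, $S_m + S_{m'} = S_{m \cup m'}$ with $m \cup m' \subset \bar m$, and thus

$$\Lambda_\infty(S_m + S_{m'}) \le \sqrt{2|m \cup m'|} \le \sqrt{2(2D+1)}.$$

It remains to apply Theorem 4 with $z = 6\log(n)$.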
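The inequality $\Lambda_\infty(S) \le \sqrt{n}\,\Lambda_2(S)$ is Cauchy–Schwarz applied to each row of $\Pi_S$: $\sum_k |(\Pi_S)_{ik}| \le \sqrt{n}\,|\Pi_S e_i|_2$. The sketch below checks it, together with $\Lambda_2^2(S_m) \le 2|m|/n$, for a trigonometric basis. The ordering of the basis vectors (constant, then cosine and sine of increasing frequency) and the subset $m$ are hypothetical, since the basis is defined elsewhere in the paper, and $\Lambda_2^2$, $\Lambda_\infty$ are computed from the assumed formulas $\max_i (\Pi_S)_{ii}$ and maximal absolute row sum.

```python
import math


def trig_vector(j, n):
    """Hypothetical ordering: j = 0 constant, odd j cosine, even j > 0 sine,
    at frequency ceil(j / 2); orthonormal in R^n while the frequency is < n / 2."""
    if j == 0:
        return [1.0 / math.sqrt(n)] * n
    freq = (j + 1) // 2
    wave = math.cos if j % 2 == 1 else math.sin
    return [math.sqrt(2.0 / n) * wave(2.0 * math.pi * freq * i / n) for i in range(n)]


n, D = 64, 5
m = [0, 1, 2, 5, 8]  # a hypothetical subset of {0, ..., 2D}
basis = [trig_vector(j, n) for j in m]
P = [[sum(q[i] * q[k] for q in basis) for k in range(n)] for i in range(n)]
lam2_sq = max(P[i][i] for i in range(n))              # max_i |Pi e_i|_2^2
lam_inf = max(sum(abs(x) for x in row) for row in P)  # maximal absolute row sum
print(lam2_sq, 2 * len(m) / n)
print(lam_inf, math.sqrt(n * lam2_sq), math.sqrt(2 * len(m)))
```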